In the Claims: 

Please cancel claims 1-30 and add the following new claims: 

31. (New) A multiple store buffer forwarding apparatus, comprising: 

a processor having a write combining buffer, and 

a non-volatile memory coupled to the processor, said non-volatile memory storing 
instructions which when executed by the processor cause the processor to: 

execute a plurality of store instructions referencing a first memory region; 

execute a load instruction referencing a second memory region; 

determine that the second memory region matches a cacheline address; 

determine that the first memory region completely covers the second memory 
region; and 

transmit a store forward is OK signal. 
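
The two determinations recited in claim 31 can be illustrated with a small software model. This is only a sketch under stated assumptions: the 64-byte cacheline size, the function names, the representation of each memory region as an (address, length) pair, and the premise that a load does not cross a cacheline boundary are all hypothetical choices for illustration and are not recited in the claims.

```python
CACHELINE_SIZE = 64  # assumed line size in bytes; the claims do not fix one


def cacheline_address(addr: int) -> int:
    """Return the address of the cacheline that contains addr."""
    return addr & ~(CACHELINE_SIZE - 1)


def store_forward_ok(stores, load, buffered_lines) -> bool:
    """Model the two determinations of claim 31 (hypothetical helper).

    stores: list of (address, length) pairs written by the store instructions
    load: (address, length) pair read by the load instruction
    buffered_lines: set of cacheline addresses currently held in the buffer
    """
    load_addr, load_len = load
    # First determination: the load's region matches a buffered cacheline.
    if cacheline_address(load_addr) not in buffered_lines:
        return False
    # Second determination: the bytes written by the stores, taken together,
    # completely cover every byte that the load reads.
    written = set()
    for addr, length in stores:
        written.update(range(addr, addr + length))
    return all(b in written for b in range(load_addr, load_addr + load_len))
```

In this model, two adjacent 4-byte stores jointly cover an 8-byte load of the same addresses, whereas either store alone does not.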

32. (New) The multiple store buffer forwarding apparatus of claim 31, wherein the write 
combining buffer includes: 

a comparator to receive and compare an address of the second memory region with all 
existing cacheline addresses in the write combining buffer, 

an address and data buffer coupled to the comparator, 

a data valid bits buffer coupled to the address and data buffer, 

a multiplexer coupled to the data valid bits buffer, and 

a comparison circuit coupled to the multiplexer. 

33. (New) The multiple store buffer forwarding apparatus of claim 32, wherein the multiplexer 
is to: 

receive a byte valid vector from the data valid bits buffer, 

receive address bits from the load instruction, 

select a group of valid bits from the byte valid vector, and 
output the group of valid bits. 

34. (New) The multiple store buffer forwarding apparatus of claim 33, wherein the comparison 
circuit is to: 

receive the group of valid bits; 

receive an incoming load instruction byte mask; 

determine that it is acceptable to forward the data using the group of valid bits and the 
incoming load instruction byte mask; and 

produce a forward OK signal. 
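
The multiplexer of claim 33 and the comparison circuit of claim 34 can be sketched in software as bit-vector operations. The 64-byte line, the 8-byte group width, the use of the load's line offset to pick the group, the bit ordering (bit i marks byte i of the line as valid), and all names below are assumptions chosen for illustration; the claims do not specify them.

```python
LINE_BYTES = 64   # assumed cacheline size in bytes
GROUP_BYTES = 8   # assumed width of one group of valid bits


def select_valid_group(byte_valid_vector: int, load_addr: int) -> int:
    """Multiplexer model: pick the group of valid bits addressed by the load.

    byte_valid_vector has one bit per byte of the line; bit i is set when
    byte i has been written by a buffered store (assumed encoding).
    """
    group_index = (load_addr % LINE_BYTES) // GROUP_BYTES
    group_mask = (1 << GROUP_BYTES) - 1
    return (byte_valid_vector >> (group_index * GROUP_BYTES)) & group_mask


def forward_ok(valid_group: int, load_byte_mask: int) -> bool:
    """Comparison model: forwarding is acceptable only when every byte the
    load requests (a set bit in load_byte_mask) is marked valid."""
    return (valid_group & load_byte_mask) == load_byte_mask
```

For example, if only bytes 8 through 11 of a line are valid, a load at line offset 8 with byte mask 0x03 would be allowed to forward in this model, while the same load with byte mask 0xFF would not.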

35. (New) The multiple store buffer forwarding apparatus of claim 31, wherein said processor 
is implemented as a multi-processor having associated with each said multi-processor a 
separate set of hardware resources. 

36. (New) A multiple store buffer forwarding apparatus, comprising: 

a memory; 

a processor coupled to said memory and having a write combining buffer, said processor 
to 

execute a plurality of store instructions referencing a first memory region of said 
memory; 

execute a load instruction referencing a second memory region of said memory; 

determine that the second memory region matches a cacheline address; 

determine that the first memory region completely covers the second memory 
region; and 

transmit a signal indicating that store buffer forwarding is authorized. 

37. (New) The multiple store buffer forwarding apparatus of claim 36, wherein the write 
combining buffer includes: 

a comparator to receive and compare an address of the second memory region with all 
existing cacheline addresses in the write combining buffer, 

an address and data buffer coupled to the comparator, 

a data valid bits buffer coupled to the address and data buffer, 

a multiplexer coupled to the data valid bits buffer, and 

a comparison circuit coupled to the multiplexer. 

38. (New) The multiple store buffer forwarding apparatus of claim 37, wherein the multiplexer 
is to: 

receive a byte valid vector from the data valid bits buffer, 

receive address bits from the load instruction, 

select a group of valid bits from the byte valid vector, and 

output the group of valid bits. 

39. (New) The multiple store buffer forwarding apparatus of claim 38, wherein the comparison 
circuit is to: 

receive the group of valid bits; 

receive an incoming load instruction byte mask; 

determine that it is acceptable to forward the data using the group of valid bits and the 
incoming load instruction byte mask; and 

produce a signal indicating that it is acceptable to forward the data. 

40. (New) The multiple store buffer forwarding apparatus of claim 36, wherein said processor 
is implemented as multiple processors wherein a separate set of hardware resources is 
associated with each of said multiple processors. 